JT's IBM Data Science Capstone Assignment:

Red and Blue Gerrymandering

What determines (alleged) Georgia gerrymandering?

(image courtesy of economicmodeling.com)

Introduction/Problem Summary:

Among the issues affecting election campaigns, gerrymandering is one of the most vexing and arcane. Given the contentious nature of this topic in Georgia (USA) and the close results of recent elections, political campaign firms (as well as voters) need to understand its effects in order to better target candidate marketing in the state during the next election season.

Gerrymandering is a practice intended to establish an unfair political advantage for a particular party or group by manipulating election district boundaries. "Districts" define geographical boundaries, with each district within a state being geographically contiguous and having about the same number of state voters.

Districting policies in Georgia, USA have been hotly debated in recent years, particularly during the 2018 gubernatorial race between Stacey Abrams and current Governor Brian Kemp. Accusations of a ‘rigged process’ were rife, as redistricting often resulted in varied and interesting “geographically-contiguous” shapes:

SOUTH%20CAROLINA.png

Greenville.png

Given that personal political ideologies have shifted over time in given locations, understanding this phenomenon is essential. We will evaluate the demographics of Georgia, contrasting the state's ‘Red’ districts (Republican) with its ‘Blue’ districts (Democratic) to see what characterizes each type. These observations will inform the investments campaign firms should consider to counter the negative effect of gerrymandering on candidate success.

Caveats: Please note that this is exploratory analysis (in the loosest sense of the word); my results and observations could mislead at a time when accurate information ("truth") is under stress. Moreover, to conduct such analysis properly, I would need access to more data (e.g., cuts of information by year before and after redistricting, and more granular income distribution and education reporting); such data is currently not freely available.

-----END OF INTRODUCTION SECTION-----

START OF DATA SECTION:

Method and Data Requirements:

I will review certain characteristics of "red" (Republican) and "blue" (Democratic) districts:

  • population voting history
  • education
  • age
  • local amenities (venue categories)
  • poverty level

These features will be fed into a k-means clustering model, which will hopefully reveal some interesting groupings. I will use the following data sources to retrieve the above:

https://ballotpedia.org/Redistricting_in_Georgia - congressional districts by number, current representative by full name, current party affiliation and term, election victory margins, and district ethnic demographics. The information is spread across several tables on this one webpage. Samples: image.png image.png

https://www2.census.gov/programs-surveys/demo/tables/voting/table01.xlsx - "Number of Votes Cast, Citizen Voting-Age Population and Voting Rates for Congressional Districts: 2018" Sample: image.png

https://www2.census.gov/programs-surveys/demo/tables/voting/table02a.xlsx - "Characteristics (Age) of the Citizen Voting-Age Population for Congressional Districts: 2018" Sample: image.png

https://www2.census.gov/programs-surveys/demo/tables/voting/table02c.xlsx - "Characteristics (Educational Attainment) of the Citizen Voting-Age Population for Congressional Districts: 2018" Sample: image.png

https://www2.census.gov/programs-surveys/demo/tables/voting/table02b.xlsx - "Characteristics (Sex and Poverty) of the Citizen Voting-Age Population for Congressional Districts: 2018" Sample: image.png
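Since the four Census tables above each report on congressional districts, they would need to be filtered to Georgia and joined into one record per district before clustering. The following is only a minimal sketch of that join: the column names (`state`, `district`, `votes_cast`, `below_poverty`) and the numbers are hypothetical placeholders, not values taken from the spreadsheets.

```python
def merge_by_district(*tables, state="Georgia"):
    """Combine rows from several tables into one record per district.

    Each table is a list of dicts; the 'state' and 'district' keys are
    assumed (hypothetical) column names used to filter and join.
    """
    merged = {}
    for table in tables:
        for row in table:
            if row["state"] != state:
                continue  # keep only the state of interest
            record = merged.setdefault(row["district"], {})
            record.update(
                {k: v for k, v in row.items() if k not in ("state", "district")}
            )
    return merged


# Placeholder rows standing in for two of the tables (made-up numbers).
votes = [
    {"state": "Georgia", "district": 1, "votes_cast": 100},
    {"state": "Alabama", "district": 1, "votes_cast": 200},
]
poverty = [
    {"state": "Georgia", "district": 1, "below_poverty": 10},
]

combined = merge_by_district(votes, poverty)
print(combined)
```

The sketch is deliberately dependency-free; in the notebook itself, reading the spreadsheets with pandas `read_excel` and joining them with `merge` would likely be the more natural route.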

https://developer.foursquare.com/docs/build-with-foursquare/categories - And of course, Foursquare data for venue categories, with locations geocoded via geopy (if it cooperates for me). Sample: image.png
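With the sources above combined into one feature row per district, the k-means clustering step described earlier could look roughly like this. It is a sketch under stated assumptions, not the final analysis: the district names, the four feature columns (turnout %, % with a bachelor's degree, % aged 65+, % below poverty) and every number are made-up placeholders, and the plain-Python Lloyd's algorithm stands in for a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical placeholder features, NOT real Georgia data:
# [turnout %, % bachelor's degree, % aged 65+, % below poverty]
features = {
    "District A": [62.0, 45.0, 14.0, 8.0],
    "District B": [60.0, 42.0, 15.0, 9.0],
    "District C": [48.0, 20.0, 19.0, 21.0],
    "District D": [50.0, 22.0, 18.0, 20.0],
}

names = list(features)
labels = kmeans(standardize([features[n] for n in names]), k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

Standardizing first matters because k-means relies on Euclidean distance, so a feature on a larger numeric scale would otherwise dominate the grouping.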

-----END OF DATA SECTION-----
